Comparison of the methods of the expresso() function from the affy package

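library(UpSetR)      # upset()
library(FactoMineR)  # PCA()
library(ade4)        # dudi.pca(), s.class()
load("./dataToComp.RData")
# Transpose so that genes are in columns and methods in rows
data.to.comp = as.data.frame(t(data.to.comp))
data.to.comp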

Breaking the methods down step by step:

Only one parameter is changed at a time.

Default parameters:

  • sumstat = "mas"
  • norm = "constant"
  • pm.cor = "mas"
  • bg.cor = "mas"
  • DEG: Ranksum
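The one-parameter-at-a-time design can be sketched as a small table of configurations. This is only an illustration in base R: the helper name `one_at_a_time` is hypothetical, the labels follow this document's names rather than the argument names of expresso(), and the alternatives listed are just the methods mentioned elsewhere in this analysis.

```r
# Default settings as listed above (document labels, not expresso() arguments).
defaults <- c(bg.cor = "mas", pm.cor = "mas", sumstat = "mas", norm = "constant")

# Illustrative alternatives for each step (a subset, for the example only).
alternatives <- list(
  bg.cor  = c("rma"),
  pm.cor  = c("subtractmm"),
  sumstat = c("medianpolish", "playerout"),
  norm    = c("quantiles", "loess")
)

# Hypothetical helper: one row with all defaults, then one row per alternative,
# each differing from the defaults in exactly one parameter.
one_at_a_time <- function(defaults, alternatives) {
  rows <- list(defaults)
  for (param in names(alternatives)) {
    for (value in alternatives[[param]]) {
      config <- defaults
      config[[param]] <- value
      rows[[length(rows) + 1]] <- config
    }
  }
  as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
}

design <- one_at_a_time(defaults, alternatives)
print(design)
```

Each row of `design` could then drive one run of the pipeline, so that any difference in the resulting gene lists can be attributed to a single changed step.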

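# List of all methods
methods = row.names(data.to.comp)
# Data frame filled with binary values (0/1)
# Note: this object shares its name with UpSetR's upset(); calls to upset(...)
# still work because R skips non-function objects when looking up a function.
upset = Upset.Binary.Dataframe(data.to.comp)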
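The helper Upset.Binary.Dataframe() is not shown in this document. UpSetR's upset() expects a data frame with one row per element and one 0/1 column per set, so a conversion of roughly the following shape is presumably involved. This is a minimal sketch under assumptions: the input format (a named list of gene ID vectors), the gene IDs, and the function name `to_binary_df` are all hypothetical.

```r
# Hypothetical input: one vector of gene IDs per method and direction.
gene_lists <- list(
  "background mas Up" = c("g1", "g2", "g3"),
  "background rma Up" = c("g2", "g3", "g4")
)

# Hypothetical helper: build a genes x sets table of 0/1 membership flags.
to_binary_df <- function(gene_lists) {
  all_genes <- sort(unique(unlist(gene_lists, use.names = FALSE)))
  membership <- sapply(gene_lists,
                       function(genes) as.integer(all_genes %in% genes))
  # check.names = FALSE keeps set names that contain spaces unchanged
  data.frame(membership, row.names = all_genes, check.names = FALSE)
}

binary <- to_binary_df(gene_lists)
print(binary)
```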

Background correction:

# Background-correction comparison
methods_bg.cor = upset[grepl("background",methods)]
methods_bg.cor
upset(methods_bg.cor,
      sets = c("background mas Up", "background rma Up"),
      sets.bar.color = "#56B4E9",
      order.by = "freq")

upset(methods_bg.cor,
      sets = c("background mas Down", "background rma Down"),
      sets.bar.color = "#56B4E9",
      order.by = "freq")
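The overlaps drawn by the UpSet plots can also be summarised numerically. The sketch below computes pairwise intersection sizes and Jaccard indices from a 0/1 table in base R; the toy values are made up for illustration and are not taken from the real data.

```r
# Toy genes x sets table of 0/1 flags (made-up values).
binary <- data.frame(
  "background mas Up" = c(1, 1, 1, 0, 0),
  "background rma Up" = c(0, 1, 1, 1, 0),
  check.names = FALSE
)

m <- as.matrix(binary)
shared <- crossprod(m)        # shared[i, j] = number of genes in both sets i and j
set_size <- diag(shared)      # the diagonal holds the size of each set
union_size <- outer(set_size, set_size, "+") - shared
jaccard <- shared / union_size  # NaN if both sets are empty

print(shared)
print(jaccard)
```

A Jaccard index close to 1 would suggest that two methods select nearly the same genes, which is the same information the UpSet plot shows as a large shared bar.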

PM correction :

The "subtractmm" method is excluded (it produces a dataset filled with NAs).

methods_pm.cor = upset[grepl("pm.cor",methods)]

upset(methods_pm.cor,
      sets = colnames(methods_pm.cor)[grepl("Up", colnames(methods_pm.cor))],
      sets.bar.color = "#56B4E9",
      order.by = "freq")

upset(methods_pm.cor,
      sets = colnames(methods_pm.cor)[grepl("Down", colnames(methods_pm.cor))],
      sets.bar.color = "#56B4E9",
      order.by = "freq")

Expression summary statistic:

  • medianpolish: applies log2() to the values
  • playerout: infinite loop?
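The log2 remark matters when medianpolish results are compared with summary methods that report values on the linear scale. As a rough illustration of why the output is on the log2 scale, the sketch below applies Tukey's median polish (stats::medpolish) to log2-transformed toy intensities. It is not affy's implementation, and the intensities are made up.

```r
# Toy PM intensities: 4 probes (rows) x 3 arrays (columns), made-up values.
# array3 is exactly twice array1.
pm <- matrix(c(120, 250,  90, 400,
               130, 260, 100, 420,
               240, 500, 180, 800),
             nrow = 4,
             dimnames = list(paste0("probe", 1:4), paste0("array", 1:3)))

# Median polish on the log2 scale: value = overall + probe effect + array effect
fit <- medpolish(log2(pm), trace.iter = FALSE)

expr_log2 <- fit$overall + fit$col  # one log2 expression value per array
expr_linear <- 2^expr_log2          # back-transform before comparing with linear-scale methods

print(expr_log2)
print(expr_linear)
```

Because array3 doubles array1, its log2 summary sits one unit above that of array1, while on the linear scale the ratio is two.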
methods_sumstat = upset[grepl("sumstat",methods)]


upset(methods_sumstat,
      sets = colnames(methods_sumstat)[grepl("Up", colnames(methods_sumstat))],
      sets.bar.color = "#56B4E9"
      # , order.by = "freq"
      )

upset(methods_sumstat,
      sets = colnames(methods_sumstat)[grepl("Down", colnames(methods_sumstat))],
      sets.bar.color = "#56B4E9"
      # , order.by = "freq"
      )

Normalization methods:

methods_norm = upset[grepl("norm",methods)]

upset(methods_norm,
      sets = colnames(methods_norm)[grepl("Up", colnames(methods_norm))],
      sets.bar.color = "#56B4E9",
      order.by = "freq")

upset(methods_norm,
      sets = colnames(methods_norm)[grepl("Down", colnames(methods_norm))],
      sets.bar.color = "#56B4E9",
      order.by = "freq")

Hierarchical (tree-based) clustering

# Two datasets: normalization methods on up-regulated and down-regulated genes
Norm = as.data.frame(t(data.to.comp))
Norm = Norm[grepl("norm",methods)]
Up = as.data.frame(t(Norm[grepl("Up",colnames(Norm))]))
Down = as.data.frame(t(Norm[grepl("Down",colnames(Norm))]))

# PCA
PCA_tools(Up)

PCA_tools(Down)
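PCA_tools() is a helper that is not shown in this document. As a rough stand-in, a PCA of a methods x genes 0/1 table can be sketched with base R's prcomp(). The data below are simulated, and dropping zero-variance genes is an assumption of this sketch (scaling fails on constant columns), not necessarily what the helper does.

```r
set.seed(1)
# Simulated methods x genes 0/1 table (rows = methods), for illustration only.
calls <- matrix(rbinom(6 * 20, 1, 0.5), nrow = 6,
                dimnames = list(paste0("method", 1:6), paste0("gene", 1:20)))
calls[, 1] <- 1  # a gene flagged by every method: zero variance

# Constant columns cannot be scaled to unit variance, so drop them first.
keep <- apply(calls, 2, var) > 0
pca <- prcomp(calls[, keep], center = TRUE, scale. = TRUE)

# Share of variance carried by each principal component
explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(explained, 3))

# Methods plotted on the first two components
plot(pca$x[, 1:2], pch = 19)
text(pca$x[, 1], pca$x[, 2], labels = rownames(calls), pos = 3)
```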

Clustering on up-regulated genes

d <- dist(Up, method = "euclidean") # distance matrix between methods
fit <- hclust(d, method="ward.D") 
plot(fit) # display dendrogram
rect.hclust(fit, k=3, border="red") # outline the 3 clusters

# Clustering on down-regulated genes

d <- dist(Down, method = "euclidean") # distance matrix between methods
fit <- hclust(d, method="ward.D") 
plot(fit) # display dendrogram
rect.hclust(fit, k=3, border="red") # outline the 3 clusters
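The rectangles drawn by rect.hclust() are only visual. To list which methods fall in each of the three clusters, cutree() can be applied to the same tree. The sketch below uses a made-up methods x genes table with neutral method names, so its output says nothing about the real data.

```r
# Made-up methods x genes 0/1 table (rows = methods).
calls <- rbind(
  methodA = c(1, 1, 1, 0, 0, 0, 0, 0),
  methodB = c(1, 1, 1, 1, 0, 0, 0, 0),
  methodC = c(0, 0, 0, 0, 0, 0, 1, 1),
  methodD = c(0, 0, 0, 1, 1, 1, 0, 0),
  methodE = c(0, 0, 1, 1, 1, 1, 0, 0)
)

d <- dist(calls, method = "euclidean")
fit <- hclust(d, method = "ward.D")

# Cluster label of each method when the tree is cut into 3 groups
clusters <- cutree(fit, k = 3)

# Methods grouped by cluster
print(split(names(clusters), clusters))
```

As a side note, the hclust() documentation states that "ward.D" does not implement Ward's criterion on unsquared distances, whereas "ward.D2" does; checking whether the grouping is stable under "ward.D2" may be worthwhile.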

Conclusion: keep only three, one per cluster?

  • quantiles / quantiles.robust
  • invariantset
  • constant / loess / contrasts / qspline

Global view of the dataset

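# Back to genes in rows / methods in columns
UpDown = as.data.frame(t(data.to.comp))
# Split the methods by direction of regulation
Upreg = UpDown[grepl("Up",methods)]
Downreg = UpDown[grepl("Down",methods)]

# Up-regulated results, one subset per processing step
bg = Upreg[grepl("background",names(Upreg))]
pm = Upreg[grepl("pm.cor",names(Upreg))]
sumstat = Upreg[grepl("sumstat",names(Upreg))]
Norm = Upreg[grepl("norm",names(Upreg))]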
PCA_tools(t(Upreg))

PCA_tools(t(Downreg))

# Processing step of each method, one entry per row of t(Upreg):
# 1 = bg.cor, 2 = pm.cor, 3 = sumstat, 4 = norm
group = c(1,1,2,2,3,3,3,4,4,4,4,4,4,4)

res.pca <- PCA(t(Upreg), graph = FALSE)
pca.res = dudi.pca(t(Upreg), scale = TRUE, scannf = FALSE)
s.class(pca.res$li, fac = as.factor(group), col = c("#00AFBB", "#E7B800", "#FC4E07", "#000000"),
        label = c("bg.cor", "pm.cor", "sumstat", "norm"))

d <- dist(t(Upreg), method = "euclidean") # distance matrix between methods
fit <- hclust(d, method="ward.D") 
plot(fit) # display dendrogram
rect.hclust(fit, k=4, border="red") # outline the 4 clusters
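To check whether the four clusters line up with the four processing steps, the cluster labels can be cross-tabulated against the group vector defined above. The sketch below reuses the same group coding but runs on simulated 0/1 data, so the resulting table is illustrative only.

```r
set.seed(42)
# Same step coding as above: 1 = bg.cor, 2 = pm.cor, 3 = sumstat, 4 = norm
group <- c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4)

# Simulated 14 methods x 30 genes 0/1 table, standing in for t(Upreg)
calls <- matrix(rbinom(14 * 30, 1, 0.5), nrow = 14,
                dimnames = list(paste0("method", 1:14), paste0("gene", 1:30)))

fit <- hclust(dist(calls, method = "euclidean"), method = "ward.D")
clusters <- cutree(fit, k = 4)

# Rows: processing step; columns: cluster found by the tree
step <- factor(group, labels = c("bg.cor", "pm.cor", "sumstat", "norm"))
agreement <- table(step = step, cluster = clusters)
print(agreement)
```

If the clustering were driven mainly by the processing step, most counts in each row would fall into a single column.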